*####### This is supplementary material for the article: Standard Jones and Modified Jones: An earnings management tutorial. Each step is discussed and explained in the article.

*This code was built to run each model individually.
*Some steps are repeated between the models, so follow them carefully and generate each variable only once.
*If you run all the code at once, Stata will return "variable already defined" errors.
*This code uses built-in Stata commands and some community-contributed commands.

*Built-in Stata commands and operators used: "doedit", "use", "sort", "xtset", "by", "bys", "gen", "l." and "d.". If you have questions about them, please consult the Stata User's Guide (https://www.stata.com/bookstore/users-guide/).
*Community-contributed commands used: "winsor2", "asreg" and "asdoc".
*Before using the code for the first time, type the following in the Stata Command window to install them:

ssc install winsor2
ssc install asreg
ssc install asdoc


*Before running the steps below, you need to open the .do file and load the dataset stored in the .dta file.
*Use the commands "doedit" and "use", respectively.

*doedit ".do file directory" 
doedit "EM_tutorial_do.do"
*use ".dta file directory"

use "EM_tutorial_data.dta" , clear

*Winsorizing the variables
winsor2 Assets Current_assets Cash Account_Reicevables Inventories ppe Current_liabilities STD noncurrent_assets Depreciation net_income CFO , replace cuts(1 99)
*To prepare the data for generating variables, sort it in ascending order by firm and year.


sort Firm_id Year

*Define panel and time variables.
xtset Firm_id Year, yearly
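
*Optional check (a minimal sketch, not part of the original tutorial code): the "l." and "d." operators used below rely on the time variable declared by xtset.
*If a firm has a gap in its years, the lag and the difference for the year right after the gap are missing. "xtdescribe" shows the participation pattern of the panel, so such gaps can be spotted.
xtdescribe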

* Standard Jones (1991) Step 1: generate the variables needed to calculate total accruals.
by Firm_id : gen lagCurrent_assets = l.Current_assets

by Firm_id : gen lag_Cash = l.Cash

by Firm_id : gen lag_Current_liabilities = l.Current_liabilities

by Firm_id : gen lag_Assets = l.Assets

gen delta_current_assets = d.Current_assets

gen delta_cash = d.Cash

gen delta_current_liabilities = d.Current_liabilities

gen TotalAccruals_t = delta_current_assets - delta_cash - delta_current_liabilities - Depreciation

by Firm_id : gen lag_Revenue = l.Revenue

gen delta_Revenue = d.Revenue

*Standard Jones (1991) Step 2: generate scaled variables.

gen TA_jones1991 = TotalAccruals_t / lag_Assets
gen InverseAT_jones1991 = 1 / lag_Assets
gen delta_Revenue_jones1991 = delta_Revenue / lag_Assets
gen PPE_jones1991 = ppe / lag_Assets
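
*Optional check (a minimal sketch, not part of the original tutorial code): inspect the scaled variables before running the regressions.
*The first year of each firm has no lag, so the variables scaled by lag_Assets are missing there and the usable sample is smaller than the full sample.
summarize TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991
count if missing(TA_jones1991)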


*#ATTENTION: Although there are four ways to run "Step 3", you will implement only one of them. We discuss this point in the main text.

*The literature does not agree on the minimum number of observations required, or on the number of observations per variable.
*Some studies (e.g., Austin & Steyerberg, 2015) suggest that two observations per variable are sufficient, while others (e.g., Hanley, 2016) argue that a much larger number is needed.
*Our suggestion: because companies in the same sector are reasonably different from one another, if any sector has fewer than 30 companies (rule of thumb), group the regressions by year only.
*References: 
*Austin, P. C., & Steyerberg, E. W. (2015). The number of subjects per variable required in linear regression analyses. Journal of clinical epidemiology, 68(6), 627-636.
*Hanley, J. A. (2016). Simple and multiple linear regression: sample size considerations. Journal of clinical epidemiology, 79, 112-119.
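
*Optional check (a minimal sketch, not part of the original tutorial code): count the observations in each sector-year group to help decide between grouping by sector and year or by year only.
*The variable name "obs_sector_year" and the cutoff of 30 are illustrative assumptions based on the rule of thumb above. The count includes observations with missing regression variables, so the usable number can be smaller.
bysort B3_sector Year : gen obs_sector_year = _N
summarize obs_sector_year
count if obs_sector_year < 30
drop obs_sector_year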

*Standard Jones (1991) Step 3: Run the regressions by industry sector and year.

*If you have not installed asreg yet, run the command:
ssc install asreg

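*#ATTENTION: "by" requires the data to be already sorted by the grouping variable, which is why a "sort" command precedes it in this code.
*"bys" (short for "bysort") sorts the data by the grouping variables before running the command. It is used here when the regressions are grouped by more than one variable (sector and year).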


*#Option 1: By sector and year, through the origin
sort Firm_id Year

bys B3_sector Year : asreg TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, noconstant se fit

*#Option 2: By sector and year, with constant term
sort Firm_id Year

bys B3_sector Year : asreg TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit

*Standard Jones (1991) Step 3: Run the regressions by year.

*#Option 3: By year, through the origin
sort Year

by Year : asreg TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, noconstant se fit

*#Option 4: By year, with constant term
sort Year

by Year : asreg TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit

*Standard Jones (1991) Step 4: Generate discretionary accruals as the absolute value of the residuals.
sort Firm_id Year

*Option 1: If Step 3 option 4 was chosen (or option 2, which also includes a constant term).
gen wc_abs_DACC = abs(_residuals)

*Clear the variables created by asreg, if the regression included a constant term.
drop _residuals _fitted _se_PPE_jones1991 _se_delta_Revenue_jones1991 _se_InverseAT_jones1991 _b_PPE_jones1991 _b_delta_Revenue_jones1991 _b_InverseAT_jones1991 _adjR2 _R2 _Nobs _b_cons _se_cons

*Option 2: If Step 3 option 3 was chosen (or option 1, which is also run through the origin).
gen nc_abs_DACC = abs(_residuals)
drop _residuals _fitted _se_PPE_jones1991 _se_delta_Revenue_jones1991 _se_InverseAT_jones1991 _b_PPE_jones1991 _b_delta_Revenue_jones1991 _b_InverseAT_jones1991 _adjR2 _R2 _Nobs


*Export the summary statistics to a .doc file. Keep in the command only the variable(s) you generated in Step 4.
asdoc by Year B3_sector, sort : summarize wc_abs_DACC nc_abs_DACC, detail


*#ATTENTION: asreg creates several new variables for each observation.
*If you run asreg again without dropping or renaming them, it will fail, because variables with the default names it creates already exist.
*To continue, drop or rename the variables: _residuals _fitted _se_PPE_jones1991 _se_delta_Revenue_jones1991 _se_InverseAT_jones1991 _b_PPE_jones1991 _b_delta_Revenue_jones1991 _b_InverseAT_jones1991 _adjR2 _R2 _Nobs (plus _b_cons and _se_cons when a constant term is used).
*"capture" suppresses the error if these variables were already dropped in Step 4.

capture drop _residuals _fitted _se_* _b_* _adjR2 _R2 _Nobs


*############################## FINISH Standard Jones.



*Modified Jones (Dechow, Sloan, and Sweeney 1995)
* Modified Jones (1995) Step 1: generate the variables needed to calculate total accruals.

sort Firm_id Year

by Firm_id : gen lag_Current_assets = l.Current_assets

by Firm_id : gen lag_Cash = l.Cash

by Firm_id : gen lag_Current_liabilities = l.Current_liabilities

gen delta_current_assets = d.Current_assets

gen delta_cash = d.Cash

gen delta_current_liabilities = d.Current_liabilities

by Firm_id : gen lag_STD = l.STD

gen delta_STD = d.STD

gen MdfTotalAccruals_t = (delta_current_assets - delta_cash - delta_current_liabilities +  delta_STD - Depreciation)

* Modified Jones (1995) Step 2: generate scaled total accruals.

gen MdfjonesTACC = MdfTotalAccruals_t / lag_Assets

*#ATTENTION: Although there are four ways to run "Step 3", you will implement only one of them. We discuss this point in the main text.
* Modified Jones (1995) Step 3: run the Standard Jones (1991) regressions by industry sector and year.

*#Option 1: By sector and year, through the origin
sort Firm_id Year

bys B3_sector Year : asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, noconstant se fit

*#Option 2: By sector and year, with constant term
sort Firm_id Year
bys B3_sector Year : asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit

*Modified Jones (1995) Step 3: Run the regressions by year.
*#ATTENTION: Although "Step 3" appears twice (by sector and year, and by year only), you will implement only one of them. We discuss this point in the main text.

*#Option 3: By year, through the origin
sort Year
by Year : asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, noconstant se fit

*#Option 4: By year, run with constant term
sort Year
by Year : asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit

*Modified Jones (1995) Step 4: Generate the variables needed to calculate normal accruals.
sort Firm_id Year

by Firm_id : gen lag_Account_Reicevables = l.Account_Reicevables

gen delta_AR = Account_Reicevables - lag_Account_Reicevables

gen scaled_delta_AR = delta_AR / lag_Assets


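*Modified Jones (1995) Step 5: Generate normal (nondiscretionary) accruals.
*Normal accruals are the fitted values of the Step 3 regression, with the change in revenue adjusted by the change in accounts receivable. The residual is not part of normal accruals.
*Run only the line that matches the option chosen in Step 3.

*If Step 3 was run with a constant term (option 2 or 4):

gen MdfNAit = _b_cons + (_b_InverseAT_jones1991 * InverseAT_jones1991) + (_b_delta_Revenue_jones1991 * (delta_Revenue_jones1991 - scaled_delta_AR)) + (_b_PPE_jones1991 * PPE_jones1991)

*If Step 3 was run through the origin (option 1 or 3):

gen MdfNAit = (_b_InverseAT_jones1991 * InverseAT_jones1991) + (_b_delta_Revenue_jones1991 * (delta_Revenue_jones1991 - scaled_delta_AR)) + (_b_PPE_jones1991 * PPE_jones1991)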

*Modified Jones (1995) Step 6: Generate discretionary accruals.

gen MdfDACC = MdfjonesTACC - MdfNAit

gen Mdf_abs_DACC = abs(MdfDACC)

*To continue, drop or rename the variables: _Nobs _R2 _adjR2 _b_InverseAT_jones1991 _b_delta_Revenue_jones1991 _b_PPE_jones1991 _b_cons _se_InverseAT_jones1991 _se_delta_Revenue_jones1991 _se_PPE_jones1991 _se_cons _fitted _residuals
*If Step 3 was run through the origin, _b_cons and _se_cons do not exist, so remove them from the command below.
drop _Nobs _R2 _adjR2 _b_InverseAT_jones1991 _b_delta_Revenue_jones1991 _b_PPE_jones1991 _b_cons _se_InverseAT_jones1991 _se_delta_Revenue_jones1991 _se_PPE_jones1991 _se_cons _fitted _residuals

*################################### FINISH Modified Jones.


*#ATTENTION: The Kothari et al. (2005) and Pae (2005) models extend the Modified Jones model (Dechow, Sloan, and Sweeney 1995), so they reuse variables generated in the previous sections.

sort Firm_id Year
* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 1: Generate changes in non-cash assets

gen delta_noncash = delta_current_assets - delta_cash

* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 2: Generate Return on assets variables.

by Firm_id : gen ROA_1 = net_income / Assets

by Firm_id : gen ROA_2 = net_income / lag_Assets

by Firm_id : gen ROA_3 = net_income / ( (Assets + lag_Assets)/2)

* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 3: Generate lagged Return on assets variables.

by Firm_id : gen lag_ROA_1 = l.ROA_1

by Firm_id : gen lag_ROA_2 = l.ROA_2

by Firm_id : gen lag_ROA_3 = l.ROA_3

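*Generate the MJRTA variable (scaled total accruals), named differently only to identify this model.
*Skip the next line if MdfTotalAccruals_t already exists from the Modified Jones (1995) section.
gen MdfTotalAccruals_t = (delta_current_assets - delta_cash - delta_current_liabilities + delta_STD - Depreciation)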

gen MJRTA = MdfTotalAccruals_t / lag_Assets

*#ATTENTION: Although there are twelve ways to run "Step 4" of the Modified Jones Model With Return on Asset, you will implement ONLY ONE of them.
*This is because there are at least three definitions of ROA, and any of the three, or its lag, can be used.
*In addition, depending on your database, you will group the regressions by sector and year, or by year only. We discuss this point in the main text and in the ATTENTION note before Standard Jones (1991) Step 3.

* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 4: Run the regressions considering ROA by industry sector and year.

*Option 1: By sector and year with ROA 1 definition:
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_1, se fit

*Option 2: By sector and year with ROA 2 definition:
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_2, se fit
 
*Option 3: By sector and year with ROA 3 definition
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_3, se fit

*Option 4: By sector and year with Lagged ROA 1 definition
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_1, se fit

*Option 5: By sector and year with Lagged ROA 2 definition
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_2, se fit

*Option 6: By sector and year with Lagged ROA 3 definition
sort Firm_id Year
bys B3_sector Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_3, se fit
 
* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 4: Run the regressions considering ROA by year.

*Option 7: By year with ROA 1 definition:
sort Year
by  Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_1, se fit

*Option 8: By year with ROA 2 definition:
sort Year
by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_2, se fit

*Option 9: By year with ROA 3 definition
sort Year
by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_3, se fit

*Option 10: By year with Lagged ROA 1 definition
sort Year
by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_1, se fit

*Option 11: By year with Lagged ROA 2 definition
sort Year
by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_2, se fit
*Option 12: By year with Lagged ROA 3 definition
sort Year
by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 lag_ROA_3, se fit

* Modified Jones Model With Return on Asset (Kothari et al., 2005) Step 5: Generate discretionary accruals.

gen ROA_abs_DACC = abs(_residuals)

*Clear the variables created by asreg. The wildcards cover the coefficient and standard error of whichever ROA variable was used in Step 4.
drop _Nobs _R2 _adjR2 _b_* _se_* _fitted _residuals
*################################### FINISH Modified Jones With ROA.


* Modified Jones Model Considering Cash Flow and Reversals (Pae, 2005) Step 1: Generate the additional variables needed to operationalize the model.
sort Firm_id Year
by Firm_id : gen lag_CFO = l.CFO
by Firm_id : gen ATPAE = MdfTotalAccruals_t
by Firm_id : gen lag_ATPAE = l.ATPAE

gen CFO_PAE = CFO / lag_Assets

gen lagCFO_PAE = lag_CFO / lag_Assets

gen lag_AT_PAE = lag_ATPAE / lag_Assets

* Modified Jones Model Considering Cash Flow and Reversals (Pae,2005) Step 2: run the regressions considering Cash Flow and reversals.

*Considering industry sector and year.
sort Firm_id Year
bys B3_sector Year :asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 CFO_PAE lagCFO_PAE lag_AT_PAE, se fit

*Considering just year.
sort Year
by Year :asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 CFO_PAE lagCFO_PAE lag_AT_PAE, se fit

* Modified Jones Model Considering Cash Flow and Reversals (Pae,2005) Step 3: generate discretionary accruals.

gen PAE_abs_DACC = abs(_residuals)
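
*A minimal sketch (not part of the original tutorial code): as in the previous models, the variables created by asreg can be dropped once the discretionary accruals are stored, so that a later asreg call does not fail.
*The wildcards "_b_*" and "_se_*" are assumed to match only the coefficient and standard error variables created by asreg. Check that no other variable in your dataset starts with these prefixes.
drop _Nobs _R2 _adjR2 _b_* _se_* _fitted _residuals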

*################################### FINISH PAE.

*For analytical purposes, Table 2 summarizes the accrual measures generated by each model, with the regressions run by year only, because some sectors have few observations.
*Where two or more options are possible, our summary is based on the following commands:
*Standard Jones (1991) Step 3, Option 4: by Year : asreg TA_jones1991 InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit
*Modified Jones (1995) Step 3, Option 4: by Year : asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991, se fit
*Kothari et al. (2005) Step 4, Option 7: by Year :asreg MJRTA InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 ROA_1, se fit
*Pae (2005) Step 2, year only: by Year :asreg MdfjonesTACC InverseAT_jones1991 delta_Revenue_jones1991 PPE_jones1991 CFO_PAE lagCFO_PAE lag_AT_PAE, se fit
*Because the models developed after Jones (1991) include a constant term, we kept the constant term in all models.

*To generate Table 2 information the syntax is:
asdoc by Year, sort : summarize wc_abs_DACC Mdf_abs_DACC ROA_abs_DACC PAE_abs_DACC, detail
